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Abstract — The objective of the study is estimating probability density functions of arithmetic operations on random 
variables. There are many methods that use specific parametrical and non-parametrical models in order to obtain accurate 
results. However, there are not many studies on speed of convergence and computation complexity of these methods. 

This paper introduces a new method of estimation used to obtain results on bounded random variables. The method is based 
on a new publication provided by Jaroszewicz and Korzen (2012). The algorithm uses numerical analysis techniques such as 
numerical integration and curve interpolation. Author's method is compared to the well-known Monte Carlo method. 
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I. Introduction 

Estimation of probability density function (PDF) has been a topic of many publications [1,2,3]. If we have PDF of random 
variable and want to perform some operations on it, we can use direct approach and calculate these operations as it is shown 
in [4]. Sometimes it is very difficult or even impossible to obtain exact result as an elementary function [5]. That is why we 
usually use techniques such as Monte Carlo method in order to get approximate result. Authors created method that comes 
from direct numerical approximation of integrals. The method is suitable and time efficient when we want to obtain the result 
of basic arithmetical operations on random variables. 

Let $X$ and $Y$ be independent random variables with known PDFs fx(x), ffx). Let g(X, Y) be the function on random 
variables. Our formal task is to obtain approximate PDF of g. Basic arithmetical operations consist of addition, subtraction, 
multiplication, and division. Our method is focused only on finding PDFs of X+Y, X-Y, XY, X/Y. 

1.1 Previous works 

Monte Carlo method is usually a background for numerical techniques [6,7]. To obtain PDF of g we obtain M sets of 
numbers from random generator with distributions X and Y. As a result, we generate M sets of 2 numbers for each set. For 
each of M sets we evaluate g function where arguments are random numbers obtained before. Finally, we have M values of g 
function, based on which we estimate PDF of g(X,Y), using various methods. 

Histogram is one of the most basic and the easiest methods to estimate PDF [1,8,9]. We use M values from Monte Carlo 
method and bin width as an input. Then, we fill histogram bins with appropriate amount of values and rescale them to get 
correct result. The crucial step is to set bin width to obtain the best result in a certain situation. Bin width optimization is 
considered in publications [10,9]. 

Kernel density estimation (KDE) is one of methods used in order to estimate PDF of random variable. Kernel methods were 
introduced by Rosenblatt [11] and Parzen [2]. We use M values and Kernel function to estimate value of PDF in every point. 
Bandwidth is the only parameter and, as in histograms approach, we should use it in a different way, depending on situation. 
Bandwidth optimization is considered in publications [12,13,14]. 

Application of numerical methods is also a common way to estimate PDFs of operations on random variables. Many methods 
are described in the following PhD thesis [15], as well as in another publication connected with the topic [16]. A new method 
using finite elements is described in [17]. Our method is similar to [18], however we focused on bounded random variables. 

Some methods mentioned in the previous paragraphs show that in order to estimate PDF we should use generated random 
samples and evaluate PDF from them. Others show how to estimate PDFs on random variables using Fourier transform, 
Lagrange polynomials and other numerical methods. Authors are not aware of any works related to numerical approximation 
of arithmetical operations based on bounded random variables, as well as any work that focuses on rating rate of convergence 
of such algorithms. 
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1.2 Motivation 

Authors are interested in measurement uncertainty, especially in probabilistic uncertainty. One can model uncertainty in 
several ways such as fuzzy, possibilistic or probabilistic approach. O'hagan presents several arguments why probabilistic 
approach is a valid way to express uncertainty [19]. In his another work, O'hagan and Oakley give arguments why 
"probability is perfect" [20] . 

Uncertain knapsack problem is one of the problems that was a subject of a comprehensive studies in literature using different 
approaches. When constraints in problem are random variables, we get probabilistic knapsack problem. Some formulations 
of probabilistic knapsack problem are described in [21]. To obtain PDF of sum of weights in knapsack problem we have to 
use one of the techniques described in the previous subsection. Monte Carlo method needs large sample in order to be 
adequate. That is why, a need for faster and more adequate techniques currently occurs. 

In many engineering problems, when we obtain two PDFs of random variables, we need to perform some basic operations on 
them. For example, if we observe moving object and evaluate PDFs of distance and time, we may need to evaluate PDF of 
velocity. Our method shows an efficient way to do that. 

In this work, we focus only on bounded random variables. This kind of random variables should be used in many physics, 
economic, decision making, and statistical problems. However, in many cases authors use unbounded random variables, what 
can lead to misinterpretation. For example, if one has water in normal conditions and one use Gaussian random variable to 
model its temperature, it can lead us to the assumption that there is non-zero probability that water can exceed 100 Celsius 
degrees or fall below 0 Celsius degree. Based on laws of physics, we know that it is impossible. 

There are many methods focused on unbounded random variables [15]. These methods usually can also handle bounded 
random variables but they are transformed into unbounded random variables or boundaries are not well approximated. In our 
method boundaries are calculated without any error in approximation. For example, when we have two glasses of water with 
temperatures X and Y (random variables) to obtain the temperature of mixture of these one has to calculate Z=(X+Y)/2. Our 
method assures that if 0<X<100 and 0<Y<100 , then 0<Z<100. 

To the best of our knowledge, there are three methods that are able to deal with bounded random variables and the results of 
the operations are still bounded. The first one is direct approach [4], where we use transformation methods to obtain exact 
result. As it was shown earlier, this approach can be very inefficient. The second one is histogram method [1]. The third one 
is kernel density estimation [11]. This method is in most cases faster than the second one. That is why, in our case we 
compare our method with the KDE method. 

II. Algorithm formulation and evaluation 

2.1 Algorithm description with mathematical background 

Let X,Y be independent, bounded on [a,b] and [c,d] random variables with known bounded PDFs f x ,fy Let g(X,Y) be basic 
arithmetical operation like addition, substraction, multiplication, or quotation. Let h(x) be PDF of g(X,Y). In our case, 
evaluating basic arithmetical operations on random variables can be divided into three steps. 

The first step is to evaluate bounds of the result of basic arithmetical operation. In order to do that, we use interval analysis 
[22]. We evaluate value of g([a,b],[c,d]). For example, when g is a sum we obtain interval [a+c,b+d]. The result of the first 
step is an interval [e,f]. We focus only on operations where [e,f] is finite interval. 

The second step is to evaluate several v from interval [ e,f] obtained earlier. In order to do that, we use numerical integration. 
From [4] we have basic arithmetical operations on random variables: 


fx+yW = 

- t)dt 



Jr 

(i) 

fx-y (. x ) = 

[ /x(0/y(t - x)dt 



' R 

(2) 

fxy(x) = 1 
4 

R f Z^fy{i)]F\ dt 

(3) 
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fx/Y C*0 — I \t\dt 

Jr (4) 

Based on above equations, we can evaluate PDFs using direct, exact mathematical approach. However, in many cases PDF 
of X and Y can be written in a form that is not suitable to use. For example, when PDF is not written as a function but as a set 
of v to y and we do not know the exact formula for PDF, we cannot use these equations directly. Also when PDFs of X and Y 
are complicated functions, it can be hard or even impossible to get exact result. 

Numerical integration is a way to deal with this problem [23]. To be able to use numerical integration we usually need 
integral over bounded domain, i.e. interval. To evaluate definite integral we use trapezoidal method [24]. The result of 
numerical integration is a number. In above equations the results of the integration is the function h(x). That is why, in our 
method we adapt trapezoidal integration. We simply select set of v from interval [e,f] and use numerical integration with 
fixed v. The result of the second step is a set of pairs of arguments and values. 

The third step is to evaluate approximate PDF using interpolating polynomials. In order to obtain smooth function we use 
cubic spline interpolation [25]. The result is a spline function which approximates PDF, hereinafter called as h Numerical (x). 

Our method is a continuation of work made by Jaroszewicz and Korzen [18]. It consists of interval analysis [22] which is 
supposed to provide us with better numerical approximation near interval endings. Outside the interval we will have exact 
result equal to zero. In other methods we usually obtain approximate result close to zero but non-zero. The interval analysis 
is not included in previously mentioned publication and to the best of our knowledge, it was not used in any work regarding 
operations on random variables. 

2.2 Evaluation technique and conditions 

There are many techniques to evaluate PDF estimation algorithms. Comprehensive overview with justifications for each 
technique was made in [26]. We compare our method with kernel density estimator as explained in section 1.2. We call KDE 
function as h KDE (x). To evaluate results we compare root mean square error (RMSE), harmonic average error (HAE), 
geometric average error (GAE), and median error (ME) on randomly chosen points from interval [e,f]. 

In order to evaluate algorithms we provide exact time conditions for each algorithm. For both methods we fix number of 
basic operations and execute algorithms. 

Let n x denotes number of v chosen in the second step of our method. Let d x denotes number of subintervals on which we 
divide integration interval to perform numerical integration. We can call d x as accuracy of numerical integration. We assume 
that we need one basic operation for interval operation in step one, n x d x operations to evaluate all v in step two and n x 
operations to evaluate PDF in step three. Finally, we get the number of basic operations in our method: 1 +n x d x +d x . 

In kernel density estimator method we first perform Monte Carlo procedure. Let N denotes the number of random numbers 
generated from X distribution. N also denotes the number of random numbers generated from Y distribution. Performing 
basic arithmetical operation on numbers from X and Y takes one operation. When we have set of numbers, we get kernel 
density estimator, that is why the total number of operations is: 2N. 

From number of basic operations one can calculate computational complexity of both mentioned algorithms. Time 
complexity of numerical method is: T numerica i(n x , d x ) = 7+ d x n x = 0(d x n x ). Time complexity of KDE method is: 
$T_{KDE}(N) = 2N = 0(N)$. Optimistic, pessimistic and average complexities are the same. In order to compare both 
methods we have to calculate number of operations with fixed error or we have to calculate error measures with fixed 
number of operations. In our paper, we decided to perform the second option. To have similar number of operations one has 
to determine d x ,n x ,N to fulfill the following equation: 

'^numerical (Ax* — ^KDE 0 ^ 

1 + d x n x + n x = 2N 
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III. Simulation 


3.1 Input samples 

Because of lack of tests in literature considering topic of interest, we decided to perform our own tests. We defined four 
PDFs of random variables and performed selected operations on them. Here are the results of aforementioned tests: 

1 x — 1, when x E [1.2) 

3 — x,whenx E [2,3] 

0, otherwise 


stnx 


fx, 00 = 6 


fxSx) = < 


1 

10 ' 

1 

2 ' 

1 

30' 


0, 

x — 1 


10 ' 
fxX*) = { x-6 


10 ' 
1 

10 ' 

4 — x 


when x E [n, 4 n\ 
otherwise 

when x E [5,7) 

whenx E [7,8) 

whenx E [8,17] 
otherwise 

when x E [1,2) 
when x E [2,3) 
whenx E [3,4] 
whenx E [6,7) 


1 
5 

11 -x 


-, whenx E [7,10) 
5 


whenx E [10,11] 
0, otherwise 


( 6 ) 


(7) 


( 8 ) 


(9) 


The choice of the above PDFs is determined by their usefulness and special features. Our goal is to choose functions that 
have been not studied before, except for the first function which is commonly used. We want to study how well algorithms 
can handle continuous and discontinuous functions. fxiM is a standard triangle function, commonly used to model 
uncertainty. Its derivative is not continuous on the peak of triangle, so it can be hard to interpolate it with polynomials with 
degree 2 or more, what is common when we want to integrate the function. fx 2 (x) is a continuous, smooth function, easy to 
integrate. fx 3 (x) is discontinuous and hard to interpolate with polynomials function. fx4x) is two -trapezoidal function with 
some distance between them. 


In the research we consider three basic operations: XjX 2 , X 3 +X 4 , X/X 2 . Obtained boundaries using interval analysis are as 
follows: 


[1,3]*[tt, 4ti]=[3i, 1 2tt] forXiX2 

(10) 

[5,17]+[1,11]=[6,28] for X 3 +X 4 

(ID 

[5,17]/ [ti, 4ji]= [5/47i, 7 /tt] for X 3 /X 2 

(12) 
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The value of the probability density function outside calculated intervals is equal to 0. We perform simulations using our 
method and KDE method with Epanechnikov kernel. 


3.2 Simulation parameters and results 

In order to check the algorithms which are more adequate in certain situations we consider seven sets of simulation 
parameters: 

1. Operations no: 100, N=50, d x =10, n x =9, 

2. Operations no: 1210, N=605, d x =30, n x =39, 

3. Operations no: 1220, N=610, d x =60, n x =20, 

4. Operations no: 10500, N=5250, d x =20, n x =500, 

5. Operations no: 10020, N=5010, d x =500, n x =20, 

6. Operations no: 10100, N=5050, d x =100, n x =100, 

7. Operations no: 100000, N=50000, d x =315, n x =315. 

Our method parameters are specified in above set. We have chosen increasing operations number in order to study 
convergence rate of both algorithms. We have also chosen different n x and d x for similar operations number to find out their 
influence on the speed of convergence. 


KDE method will be considered with Epanechnikov kernel. This kernel minimalizes mean squared error and ensures that 
obtained random variable is still bounded after performing an arithmetic operation. Bandwidth associated with kernel is 
calculated with Mathematica software [27]. To perform the evaluation of methods we apply uniformly random 100 points 
from the interval [e,f]. We call this set as R x . Next, for each xcR x we evaluate h(x). In order to do that, we use numerical 
integration, as it was done before with d x = 10000. Having sets of h Numerical (x), Iikde(x) and h(x) when xcR x , we can evaluate 
errors for each x using following equations: 


^ Numerical 0*0 I ^ Numerical 0*0 h(pc) \ 


&KDe(. X ) — I h-KDEix) — fr(X)l 


(13) 

(14) 


Then, for each x we evaluate error measures RMSE, HAE, GAE, and ME using proper definitions. 

We execute simulation using parameters and input samples as it was done in the previous subsections. The results are 
gathered in tables 1, 2, and 3. 


Table 1 


Set no. 

Method 

RMSE 

HAE 

GAE 

ME 

1 

KDE 

8,74e-3 

6,09e-4 

3,52e-3 

8,59e-3 

1 

Numerical 

9,34e-4 

6,21e-5 

4,45e-4 

5,89e-4 

2 

KDE 

5,36e-3 

2,09e-3 

3,45e-3 

5,17e-3 

2 

Numerical 

l,75e-4 

l,81e-5 

7,12e-5 

9,13e-5 

3 

KDE 

6,52e-3 

l,14e-3 

3,09e-3 

5,2e-3 

3 

Numerical 

l,61e-4 

l,22e-5 

3,06e-5 

3,52e-5 

4 

KDE 

l,72e-3 

6,39e-4 

l,17e-3 

l,59e-3 

4 

Numerical 

2,27e-4 

4,79e-5 

l,31e-4 

l,78e-4 

5 

KDE 

2,63e-3 

l,93e-3 

2,24e-3 

2,53e-3 

5 

Numerical 

5,43e-5 

l,89e-6 

6,54e-6 

4,67e-6 

6 

KDE 

2,57e-3 

l,44e-3 

l,99e-3 

2,16e-3 

6 

Numerical 

4,7e-4 

5,21e-7 

6,85e-6 

8,92e-6 

7 

KDE 

l,24e-3 

4,32e-4 

9,25e-4 

l,2e-3 

7 

Numerical 

l,24e-6 

l,48e-7 

6,08e-7 

8,39e-7 
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Table 2 


Set no. 

Method 

RMSE 

HAE 

GAE 

ME 

1 

KDE 

l,83e-2 

3,04e-3 

8,2e-3 

l,02e-2 

1 

Numerical 

2,37e-2 

2,28e-4 

7,88e-3 

l,04e-2 

2 

KDE 

7,22e-3 

9,94e-4 

2,59e-3 

3,94e-3 

2 

Numerical 

8,63e-3 

3,79e-5 

l,19e-3 

l,93e-3 

3 

KDE 

6,13e-3 

3,14e-3 

4,22e-3 

4,2e-3 

3 

Numerical 

4,25e-3 

2,67e-4 

l,07e-3 

8,54e-4 

4 

KDE 

3,68e-3 

3,6e-4 

l,7e-3 

2,3e-3 

4 

Numerical 

l,le-2 

2,68e-5 

2,88e-3 

2,88e-3 

5 

KDE 

2,26e-3 

4,86e-4 

l,17e-3 

l,24e-3 

5 

Numerical 

2,19e-3 

l,38e-4 

5,76e-4 

7,lle-4 

6 

KDE 

3,8e-3 

4,4e-4 

l,2e-3 

l,17e-3 

6 

Numerical 

2,2e-3 

5,86e-5 

6,25e-4 

7,24e-4 

7 

KDE 

l,83e-3 

6,27e-5 

7,21e-4 

6,63e-4 

7 

Numerical 

4,08e-4 

4,65e-7 

7,88e-5 

l,31e-4 


Table 3 


Set no. 

Method 

RMSE 

HAE 

GAE 

ME 

1 

KDE 

l,06e-l 

5,83e-4 

l,71e-2 

2,83e-2 

1 

Numerical 

l,18e-l 

9,91e-4 

l,26e-2 

6,62e-3 

2 

KDE 

l,02e-l 

3,75e-4 

5,32e-3 

5,67e-3 

2 

Numerical 

7,le-2 

2,82e-5 

2,17e-3 

l,36e-3 

3 

KDE 

l,05e-l 

l,56e-5 

5,2e-3 

6,54e-3 

3 

Numerical 

1, 01e-l 

2,22e-4 

l,66e-3 

9,13e-4 

4 

KDE 

5,93e-2 

4,06e-4 

3,95e-3 

4,65e-3 

4 

Numerical 

3,65e-2 

7,87e-5 

2,72e-3 

2,85e-3 

5 

KDE 

7,08e-2 

3,03e-5 

4,46e-3 

4,03e-3 

5 

Numerical 

6,34e-2 

6,16e-5 

3,32e-4 

l,79e-4 

6 

KDE 

7,49e-2 

6,25e-4 

4,8e-3 

4,55e-3 

6 

Numerical 

7,73e-3 

7,74e-5 

8,le-4 

4,65e-4 

7 

KDE 

3,07e-2 

2,25e-4 

l,36e-3 

l,23e-3 

7 

Numerical 

l,98e-3 

l,34e-5 

l,93e-4 

l,79e-4 


IV. Result & Discussion 

Based on analysis of data included in the tables, we observed that both in KDE and numerical methods all errors are 
inversely related to the number of operations. In case of KDE method we know that it is convergent to real PDF [11]. We 
also observe that errors are decreasing faster in case of numerical method than KDE method. In nearly all cases when number 
of operations is equal we can notice that numerical method error is smaller than KDE error. 

From observations based on above data, we can assume that numerical method is converging to real PDF faster than KDE 
method. Numerical method is also deterministic method in contrast to indeterministic KDE method. That is why; no 
dependency on correct pseudorandom numbers generator has been notified. 

The general conclusion is that different cases applied in both methods converge with different speed. The possible 
explanation of this phenomenon can be variability of obtained PDF. The more obtained PDF variable is, the higher all error 
measures should be. 
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4.1 Convergence rate discussion 

Based on our observations, both methods assure that all error measures converges to zero. Rate of convergence of numerical 
method for RMSE is faster than l/ln(n) and slower than 1/n. There are two main points where numerical errors arise: 
integration and interpolation. 

Integration error arises as we perform convolution step for each selected point. For trapezoidal method, overall error depends 
on integration interval length f-e , number of divisions of interval d x and second derivative of resulting function. The exact 
formula for integration error for trapezoidal method is as follows: 

d x ~ 1 (f ~ X )3 

Err int (d x ) = ^ ^ — /"(**) 

i=0 (15) 

where x t is unknown point in interval [e + 1 — - r € + (i + 1) • Rate °f convergence depends on second derivative 

d x d x 

and interval length. Integration error convergence rate matches following equation: Err i7lt (d x ^) < 0(— )• As we have n x 

d x 

points to calculate, overall error convergence rate matches the equation: Ett Ad 71 1 < 0 (— V One can use other 

OV . 2 77 £ V X - f X J V ^ 2 

integration methods which can provide better results. For example, Gauss quadrature and Clenshaw-Curtis quadrature in 
many cases have 0 (-r— ) convergence rate [28]. 

Interpolation error arises as we perform interpolation step using cubic spline. From [29] we know that interpolation error 

convergence rate matches the following equation: Err^ p[ < 0 Obtaining overall error convergence rate upper 

n x 

boundary depends on exact problem formulation as integration error and interpolation error are dependent. Based on the 
observations, when n x =d x we find out that overall error convergence rate is between Q f ') and Q / 1 y Convergence 

I n d x n x 

rate drops when n x ^ d x , however the exact boundary and dependence is unknown. 

V. Conclusion and future works 

Authors developed the method to perform basic arithmetical operations on random variables. In comparison to well-known 
and commonly applied KDE method, it is faster and more adequate in simulations. The method is in prototype phase and 
authors are willing to put further effort in order to develop it, what is to minimize faster error measures. The results of our 
study are discussed focusing on convergence rate observed from results of the simulation. 

In further work, authors want to examine the best relationship between integration accuracy- d x and number of chosen points- 
n x . The method should give better result when we know how to choose points v and we apply integration methods in a more 
effective way. Authors also want to compare KDE and numerical method with different kernel estimators for bigger amount 
of samples. Authors are also willing to develop method to estimate PDFs of more complex operations on random variables. 
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